This project gave me the opportunity to become more familiar with using graphs to search for specific information. I found it very interesting to study how the use of various names has changed over time, since it shows how many people with a specific name were born in certain years and how old they might be now. In question 2, about the name Brittany, I was interested in how well our association between a name and an age matches reality.
Read and format project data
# Learn more about Code Cells: https://quarto.org/docs/reference/cells/cells-jupyter.html
# Include and execute your code here
import pandas as pd
from lets_plot import *  # assumed plotting library (provides ggplot, aes, geom_*)
LetsPlot.setup_html()

df_names = pd.read_csv("https://raw.githubusercontent.com/byuidatascience/data4names/master/data-raw/names_year/names_year.csv")
CQ1: Comparing Name “Anastasia” Popularity Over Time
How does your name at your birth year compare to its use historically?
Until the 1980s, this name was given to 0-200 children per year, and most of the time to fewer than 100, which is quite small. Active use of the name began in the 1980s: in less than 10 years, its popularity quadrupled. By the year of my birth (1998), it had grown by almost another 1.5 times. At the moment, this figure continues to grow.
Code - Filter data by name Anastasia and draw a graph with the birth year (1998) marked
# Filter data by name
d_name = df_names.query('name == "Anastasia"')
# Draw a graph
chart = ggplot(d_name, aes('year', 'Total')) +\
    geom_line(color='blue') +\
    geom_area(fill='lightblue', alpha=0.6) +\
    geom_vline(xintercept=1998, linetype='dashed', color='green', size=1) +\
    scale_x_continuous(format="d") +\
    ggtitle(f'History of use of the name Anastasia') +\
    labs(x='Year', y='People born with the name Anastasia')
chart
A chart showing the history of the name Anastasia before, on and after 1998
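The growth figures quoted for Anastasia can be checked with a small helper. This is a minimal sketch, not the code behind the chart: `growth_factor` is a hypothetical helper, and the `totals` values are made-up placeholders chosen to mirror the "quadrupled" and "1.5 times" statements, not numbers taken from names_year.csv. With the project data, a similar year-to-count mapping could be built from the `year` and `Total` columns of `d_name`.

```python
def growth_factor(totals, start_year, end_year):
    """Return how many times larger the count in end_year is than in start_year."""
    start = totals[start_year]
    if start == 0:
        raise ValueError("start year has a zero count, so the ratio is undefined")
    return totals[end_year] / start


# Placeholder counts (NOT real data), keyed by year.
totals = {1980: 100, 1989: 400, 1998: 600}

print(growth_factor(totals, 1980, 1989))  # 4.0
print(growth_factor(totals, 1989, 1998))  # 1.5
```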
CQ2: Age Estimation Based on Name Brittany
If you talked to someone named Brittany on the phone, what is your guess of their age? What ages would you not guess?
For me, the name Brittany is associated with a young woman of about 20. I definitely would not have guessed that she could be 45 or older unless the voice itself suggested it. However, according to the available data, most people with this name are 33-34 years old today.
Code - Filter data by name Brittany and draw a graph
# Filter data by name
d_name = df_names.query('name == "Brittany"')
# Draw a graph
chart = ggplot(d_name, aes('year', 'Total')) +\
    geom_line(color='blue') +\
    geom_area(fill='lightblue', alpha=0.6) +\
    scale_x_continuous(format="d") +\
    ggtitle(f'History of use of the name Brittany') +\
    labs(x='Year', y='People born with the name Brittany')
chart
A chart showing the history of the name Brittany
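The age guess for Brittany comes down to finding the birth year with the most people and subtracting it from the current year. The sketch below illustrates that arithmetic under stated assumptions: `peak_year` and `estimated_age` are hypothetical helpers, the counts are placeholders rather than values from the dataset, and the reference year 2024 is only an example.

```python
def peak_year(totals):
    """Return the year with the highest count (the earliest year wins ties)."""
    return min(totals, key=lambda year: (-totals[year], year))


def estimated_age(totals, reference_year):
    """Guess an age as the gap between the reference year and the peak birth year."""
    return reference_year - peak_year(totals)


# Placeholder counts (NOT real data), keyed by birth year.
totals = {1985: 50, 1990: 300, 1995: 120}

print(peak_year(totals))            # 1990
print(estimated_age(totals, 2024))  # 34
```

A single peak year gives only the most likely age. The ages one would not guess correspond to the years where the chart stays close to zero.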
CQ3: Comparative Analysis of Christian Names from 1920 to 2000
Mary, Martha, Peter, and Paul are all Christian names. From 1920 - 2000, compare the name usage of each of the four names.
Prior to 1970, the most popular name was Mary. It was given to children 2-4 times more often than the name Paul, which was in second place. An interesting point is that from 1970 to 2000 the use of the names Mary and Paul was almost the same. Martha and Peter were quite rare names throughout the period from 1920 to 2000. However, it is interesting to note that Martha was initially the more common of the two, but after 1950 the situation changed and Peter became more popular. By 2000, the usage of all four names was nearly the same.
Code - Filter data by Christian names and year (1920-2000) and draw a graph
# Filter data by name and year
d_name = df_names.query('name in ["Mary", "Martha", "Peter", "Paul"] & year >=1920 & year <=2000')
# Draw a graph
chart = ggplot(d_name, aes('year', 'Total')) +\
    geom_line(aes(color='name'), size=1) +\
    scale_x_continuous(format="d") +\
    ggtitle(f'History of use of the Christian names') +\
    labs(
        x='Year',
        y='People born with the Christian names',
        color='Christian names')
chart
A chart showing the history of the names between 1920 and 2000
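Statements such as "2-4 times more often than the name Paul" can be verified by ranking the four names in a given year and dividing the top count by the runner-up. The following is a rough sketch of that comparison: `ranking_for_year` and `lead_ratio` are hypothetical helpers, and the nested dictionary holds placeholder counts, not figures from names_year.csv.

```python
def ranking_for_year(counts_by_name, year):
    """Return the names sorted from most to least used in the given year."""
    return sorted(
        counts_by_name,
        key=lambda name: counts_by_name[name].get(year, 0),
        reverse=True,
    )


def lead_ratio(counts_by_name, year):
    """Return how many times more often the top name was used than the runner-up."""
    first, second = ranking_for_year(counts_by_name, year)[:2]
    runner_up = counts_by_name[second].get(year, 0)
    if runner_up == 0:
        raise ValueError("runner-up has a zero count, so the ratio is undefined")
    return counts_by_name[first][year] / runner_up


# Placeholder counts (NOT real data): name -> {year: count}.
counts_by_name = {
    "Mary": {1950: 900, 2000: 110},
    "Paul": {1950: 300, 2000: 100},
    "Martha": {1950: 90, 2000: 95},
    "Peter": {1950: 80, 2000: 105},
}

print(ranking_for_year(counts_by_name, 1950))  # ['Mary', 'Paul', 'Martha', 'Peter']
print(lead_ratio(counts_by_name, 1950))        # 3.0
```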
CQ4: Impact of a Famous Movie on Name Usage
Think of a unique name from a famous movie. Plot the usage of that name and see how changes line up with the movie release. Does it look like the movie had an effect on usage?
I analyzed data on the name Dominic from the movie “The Fast and the Furious”. According to the data, the name was already growing in popularity before the first film was released in 2001. After the first film, the use of this name increased. However, after each of the next three releases, in 2003, 2006, and 2009, there was a sharp decline in its use, although by the fifth film in 2011 the total still exceeded the usage before the first film. After the fifth film, the use of the name Dominic increased again, and after the sixth film in 2013 it declined. Data after 2015 are not available.
Code - Filter data by name Dominic and draw a graph
# Filter data by name
d_name = df_names.query('name == "Dominic"')
# Draw a graph
chart = ggplot(d_name, aes('year', 'Total')) +\
    geom_line(color='blue') +\
    geom_area(fill='lightblue', alpha=0.6) +\
    geom_vline(xintercept=2001, linetype='dashed', color='red') +\
    geom_vline(xintercept=2003, linetype='dashed', color='gray') +\
    geom_vline(xintercept=2006, linetype='dashed', color='green') +\
    geom_vline(xintercept=2009, linetype='dashed', color='black') +\
    geom_vline(xintercept=2011, linetype='dashed', color='orange') +\
    geom_vline(xintercept=2013, linetype='dashed', color='#800080') +\
    geom_vline(xintercept=2015, linetype='dashed', color='pink') +\
    scale_x_continuous(format="d") +\
    ggtitle(f'History of use of the name Dominic') +\
    labs(x='Year', y='People born with the name Dominic')
chart
A chart showing the history of the name Dominic before the first ‘Fast and Furious’ release and throughout all releases
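The rise-or-decline reading after each release can also be expressed numerically by comparing the count in a release year with the count in the following year. This is only an illustrative sketch: `change_after_releases` is a hypothetical helper, and the counts are placeholders, not the Dominic figures from the dataset.

```python
def change_after_releases(totals, release_years):
    """Map each release year to the change in count from that year to the next."""
    return {year: totals[year + 1] - totals[year] for year in release_years}


def describe(delta):
    """Label a year-over-year change as an increase, a decline, or no change."""
    if delta > 0:
        return "increase"
    if delta < 0:
        return "decline"
    return "no change"


# Placeholder counts (NOT real data), keyed by year.
totals = {2001: 100, 2002: 130, 2003: 140, 2004: 120}

changes = change_after_releases(totals, [2001, 2003])
for year, delta in changes.items():
    print(year, delta, describe(delta))
# 2001 30 increase
# 2003 -20 decline
```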
ST1: Reproduction of “Elliot” Name Chart
Reproduce the chart Elliot using the data from the names_year.csv file.
The chart shows the history of the use of the name Elliot together with the E.T. releases: the first in mid-1982, the second in 1985, and the third in 2002.
Code - Filter data by name Elliot and year (>=1950) and draw a graph
# Filter data by name and year
d_name = df_names.query('name == "Elliot" & year >=1950')
# List of years
years = [
    {'year': 1982.5, 'title': 'E. T. Released', 'position': 1976},
    {'year': 1985, 'title': 'Second Release', 'position': 1992},
    {'year': 2002, 'title': 'Third Release', 'position': 2008}
]
# Draw a graph
chart = ggplot(d_name, aes('year', 'Total')) +\
    geom_line(color='blue') +\
    geom_area(fill='lightblue', alpha=0.6) +\
    scale_x_continuous(format="d") +\
    scale_y_continuous(format="d") +\
    ggtitle(f'Elliot... What?') +\
    labs(x='Year', y='Total')
# Add a dashed line and a label for each release
for year in years:
    chart += geom_vline(xintercept=year['year'], linetype='dashed', color='red') +\
        geom_text(label=year['title'], x=year['position'], y=d_name['Total'].max(), color='black')
chart
A chart showing the history of the name Elliot since 1950, with the three E.T. releases marked